
ROCS-Derived Features for Virtual Screening

arXiv.org Machine Learning

Ligand-based virtual screening is based on the assumption that similar compounds have similar biological activity [Willett, 2009]. Compound similarity can be assessed in many ways, including comparisons of molecular "fingerprints" that encode structural features or molecular properties [Todeschini and Consonni, 2009] and measurements of shape, chemical, and/or electrostatic similarity in three dimensions [Hawkins et al., 2007; Muchmore et al., 2006; Ballester and Richards, 2007]. Three-dimensional approaches such as rapid overlay of chemical structures (ROCS) [Hawkins et al., 2007] are especially interesting because of their potential to identify molecules that are similar from the point of view of a target protein but dissimilar in underlying chemical structure ("scaffold hopping"; [Böhm et al., 2004]). ROCS represents atoms as three-dimensional Gaussian functions [Grant and Pickup, 1995; Grant et al., 1996] and calculates similarity as a function of volume overlaps between alignments of pre-generated molecular conformers. Chemical ("color") similarity is measured by overlaps between dummy atoms marking interesting chemical functionalities: hydrogen bond donors and acceptors, charged functional groups, rings, and hydrophobic groups.
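The overlap calculation described above can be illustrated with a minimal sketch. It assumes two conformers that are already aligned, gives every atom the same spherical Gaussian, keeps only the first-order (pairwise) overlap terms, and normalizes with a Tanimoto ratio. The amplitude `P`, the width `ALPHA`, and the function names are illustrative assumptions, not values or APIs from ROCS itself, and the alignment search that ROCS performs is omitted.

```python
import math

P = 2.7      # illustrative Gaussian amplitude (assumption, not taken from the text)
ALPHA = 0.8  # illustrative Gaussian width in 1/Angstrom^2 (assumption)

def pair_overlap(a, b, alpha=ALPHA, p=P):
    """Analytic overlap integral of two identical spherical Gaussians
    centered at 3D points a and b."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return p * p * (math.pi / (2 * alpha)) ** 1.5 * math.exp(-alpha * d2 / 2)

def overlap_volume(mol_a, mol_b):
    """First-order overlap volume: sum over all atom pairs.
    Higher-order correction terms are ignored in this sketch."""
    return sum(pair_overlap(a, b) for a in mol_a for b in mol_b)

def shape_tanimoto(mol_a, mol_b):
    """Tanimoto-style similarity built from overlap volumes.
    Assumes the two coordinate sets are already aligned."""
    o_ab = overlap_volume(mol_a, mol_b)
    o_aa = overlap_volume(mol_a, mol_a)
    o_bb = overlap_volume(mol_b, mol_b)
    return o_ab / (o_aa + o_bb - o_ab)

# Hypothetical three-atom "molecules" (coordinates in Angstroms).
mol_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
mol_b = [(0.0, 0.5, 0.0), (1.5, 0.5, 0.0), (3.0, 0.5, 0.0)]
score_same = shape_tanimoto(mol_a, mol_a)
score_shifted = shape_tanimoto(mol_a, mol_b)
```

A "color" score could be computed the same way by placing the Gaussians on dummy atoms for each chemical feature type and summing overlaps only between matching types.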


Molecular Graph Convolutions: Moving Beyond Fingerprints

arXiv.org Machine Learning

Molecular "fingerprints" encoding structural information are the workhorse of cheminformatics and machine learning in drug discovery applications. However, fingerprint representations necessarily emphasize particular aspects of the molecular structure while ignoring others, rather than allowing the model to make data-driven decisions. We describe molecular "graph convolutions", a machine learning architecture for learning from undirected graphs, specifically small molecules. Graph convolutions use a simple encoding of the molecular graph (atoms, bonds, distances, etc.), which allows the model to take greater advantage of information in the graph structure. Although graph convolutions do not outperform all fingerprint-based methods, they (along with other graph-based methods) represent a new paradigm in ligand-based virtual screening with exciting opportunities for future improvement.
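To make the idea of learning directly from the molecular graph concrete, here is a minimal sketch of one generic neighbor-aggregation step on an undirected graph. It is not the architecture described in the paper: the function names, the one-hot atom features, and the identity weight matrices are all illustrative assumptions, and bond and distance features are left out.

```python
def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(wk * vk for wk, vk in zip(row, v)) for row in w]

def graph_conv_layer(atom_feats, bonds, w_self, w_neigh):
    """One generic graph-convolution step (illustrative, not the paper's model):
    each atom combines its own features with the sum of its neighbors'
    features, followed by a ReLU."""
    n = len(atom_feats)
    neighbors = [[] for _ in range(n)]
    for i, j in bonds:               # undirected graph: record both directions
        neighbors[i].append(j)
        neighbors[j].append(i)
    dim = len(atom_feats[0])
    out = []
    for i in range(n):
        agg = [sum(atom_feats[j][k] for j in neighbors[i]) for k in range(dim)]
        h = [s + m for s, m in zip(matvec(w_self, atom_feats[i]),
                                   matvec(w_neigh, agg))]
        out.append([max(0.0, x) for x in h])
    return out

def readout(atom_feats):
    """Order-invariant molecule-level vector: sum over atoms."""
    return [sum(col) for col in zip(*atom_feats)]

# Hypothetical C-C-O heavy-atom skeleton with one-hot features [is_C, is_O].
atoms = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
bonds = [(0, 1), (1, 2)]
identity = [[1.0, 0.0], [0.0, 1.0]]  # placeholder weights; a real model learns these
hidden = graph_conv_layer(atoms, bonds, identity, identity)
mol_vector = readout(hidden)
```

Because the aggregation and the readout are both sums, renumbering the atoms changes the order of the per-atom rows but not the molecule-level vector, which is the property that lets such a model operate on graphs rather than on a fixed fingerprint layout.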